Automatic generation of pronunciation lexicons for Mandarin spontaneous speech

نویسندگان

  • William J. Byrne
  • Veera Venkataramani
  • Terri Kamm
  • Thomas Fang Zheng
  • Zhanjiang Song
  • Pascale Fung
  • Yi Liu
  • Umar Ruhi
چکیده

Pronunciation modeling for large vocabulary speech recognition attempts to improve recognition accuracy by identifying and modeling pronunciations that are not in the ASR systems pronunciation lexicon. Pronunciation variability in spontaneous Mandarin is studied using the newly created CASS corpus of phonetically annotated spontaneous speech. Pronunciation modeling techniques developed for English are applied to this corpus to train pronunciation models which are then used for Mandarin Broadcast News transcription.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Pronunciation Modeling for Spontaneous Mandarin Speech Recognition

Pronunciation variations in spontaneous speech can be classified into complete changes and partial changes. A complete change is the replacement of a canonical phoneme by another alternative phone, such as ‘b’ being pronounced as ‘p’. Partial changes are variations within the phoneme such as nasalization, centralization and voiced. Most current work in pronunciation modeling for spontaneous Man...

متن کامل

Multiword expressions in spontaneous speech: do we really speak like that?

In this study, we examined the pronunciation characteristics of multiword expressions (MWEs). We first drew up an inventory of frequently occurring N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. For about 10% of these Ngrams phonetic transcriptions were available, which were examined. Our results show that the pronunciation ...

متن کامل

Modelling pronunciation variations in spontaneous Mandarin speech

Pronunciation in spontaneous Mandarin speech tends to be much more variable than in read speech. In current recognition systems, pronunciation dictionaries usually only contain one standard pronunciation for each word, so that the amount of variability that can be modelled is very limited. Most recent research work for modelling variations in spontaneous speech focuses on the lexicon level, whi...

متن کامل

Use of Graphemic Lexicons for Spoken Language Assessment

Automatic systems for practice and exams are essential to support the growing worldwide demand for learning English as an additional language. Assessment of spontaneous spoken English is, however, currently limited in scope due to the difficulty of achieving sufficient automatic speech recognition (ASR) accuracy. ”Off-the-shelf” English ASR systems cannot model the exceptionally wide variety of...

متن کامل

Pronunciation Learning from Continuous Speech

This paper explores the use of continuous speech data to learn stochastic lexicons. Building on previous work in which we augmented graphones with acoustic examples of isolated words, we extend our pronunciation mixture model framework to two domains containing spontaneous speech: a weather information retrieval spoken dialogue system and the academic lectures domain. We find that our learned l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001